Cuttle: Enabling Cross-Column Compression in Distributed Column Stores

Authors

  • Hao Liu
  • Jiang Xiao
  • Xianjun Guo
  • Haoyu Tan
  • Qiong Luo
  • Lionel M. Ni

Abstract

We observe that, in real-world distributed data warehouse systems, data columns from different sources often exhibit redundancy. Although these systems can employ both general and column-oriented compression schemes to reduce storage pressure, such cross-column redundancy (CCR) is not recognized or exploited effectively. We therefore propose Cuttle, a column storage system that enables cross-column compression to reduce CCR. Specifically, we identify three kinds of CCR and develop a referential transformation encoding (RTE) scheme to compress multiple columns of data with CCR. Furthermore, we address the CCR selection problem and propose a greedy algorithm to generate cross-column compression schemes. Our experiments on real-world datasets show that Cuttle can further reduce data size by half after both the column-oriented and general compression schemes have been applied, and that query processing performance with Cuttle improves by 20% without any change to the application programs.
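
The abstract does not spell out how RTE works, so the following is only a toy illustration of the general idea of encoding one column by reference to another. The choice of a constant integer offset as the transformation, the exception map, and the names `encode_by_reference` and `decode_by_reference` are all assumptions made for this sketch, not the scheme used in Cuttle.

```python
from collections import Counter


def encode_by_reference(reference, target):
    """Encode `target` as a constant offset from `reference` plus exceptions.

    Hypothetical helper: the offset transformation is an assumption for
    illustration, not the RTE scheme described in the paper.
    """
    if len(reference) != len(target):
        raise ValueError("columns must have the same length")
    diffs = [t - r for r, t in zip(reference, target)]
    # Use the most common difference as the shared offset.
    offset = Counter(diffs).most_common(1)[0][0] if diffs else 0
    # Rows that do not follow the offset are stored explicitly by row index.
    exceptions = {
        i: t for i, (t, d) in enumerate(zip(target, diffs)) if d != offset
    }
    return offset, exceptions


def decode_by_reference(reference, offset, exceptions):
    """Rebuild the target column from the reference column."""
    return [exceptions.get(i, r + offset) for i, r in enumerate(reference)]


# Two made-up columns that are redundant with each other.
order_ts = [100, 205, 310, 420]
ship_ts = [103, 208, 313, 430]

offset, exceptions = encode_by_reference(order_ts, ship_ts)
assert decode_by_reference(order_ts, offset, exceptions) == ship_ts
```

In this sketch the second column shrinks to one offset and one exception entry, which conveys why redundancy between columns can be worth exploiting on top of per-column compression.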

Similar resources

Scaling out Column Stores: Data, Queries, and Transactions

The amount of data available today is huge and keeps increasing steadily. Databases help to cope with huge amounts of data. Yet, traditional databases are not fast enough to answer the complex analytical queries that decision makers in big enterprises ask over large datasets. This is where column stores have their field of application. Tailored to this type of on-line analytical processing (OLA...

Data Compression in Database Query Processing

Row-oriented databases (or “row-stores”) employ data compression methods (such as dictionary encoding) to reduce I/O cost by decreasing data sizes. However, row-stores face two limitations when applying data compression schemes: (1) they can encode only a single value at a time, and (2) they must pay the decompression cost during query processing. The above shortcomings l...

Buckling and failure characteristics of slender web I-column girders under interactive compression and shear

Geometric and material nonlinear behavior of slender webs in I-column girders having stocky flanges under the action of combined lateral and axial loads is investigated. Interaction curves corresponding to the application of compressive and shear loads at buckling and ultimate stages for both web plates and column sections are plotted. In addition, the effects of flange and web slenderness rati...

Size Effect in Compression Fracture: Splitting Crack Band

A simplified fracture-mechanics-based model of compression failure of centrically or eccentrically loaded quasi-brittle columns is presented and the size effect on the nominal strength of a column is predicted. Failure is modeled as propagation of a band of axial splitting cracks in a direction orthogonal or inclined with respect to the column axis. The maximum load is calculated from the condi...

Data Compression on Columnar-Database Using Hybrid Approach (Huffman and Lempel-Ziv Welch Algorithm)

A column-oriented database is an enhanced approach to serving the needs of Business Intelligence. A column-oriented database management system stores content by columns rather than by rows. This type of database differs from a traditional database with regard to performance, storage requirements, and ease of schema modification. One of the major advantages of column based datab...

Publication date: 2017